CAJAL interface #160

Draft
wants to merge 3 commits into
base: master


schlegelp
Collaborator

@schlegelp schlegelp commented Sep 10, 2024

[WIP] This PR provides an interface with CAJAL. Details and evaluation to follow but it already looks very useful even after just a first glance.

  • conversion for TreeNeurons
  • conversion for MeshNeurons
  • conversion for VoxelNeurons (not sure that even makes sense)
  • conversion for Dotprops: this would be cool but also needs a custom implementation for CAJAL's sampling
  • Gromov-Wasserstein (GW) distance calculation
  • quantized Gromov-Wasserstein (QGW) distance
  • second lower bound (SLB) distance calculation
  • combined SLB + QGW distance calculations
  • neuron averaging (check if we can use a landmark transform)
  • integration of (multi-modal) metadata

Notes:

  • right now this is implemented as an interface (i.e. something that needs to be explicitly imported) but we could also roll this into the top-level namespace as e.g. navis.cajal_dist
  • CAJAL itself only does pairwise all-by-all calculations but it would be nice to be able to run sets of query vs sets of target neurons; maybe we can make a PR for that
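A possible stop-gap for the query-vs-target case, sketched under assumptions: run the all-by-all calculation on the union of both sets and slice out the query x target block afterwards. The function name `query_vs_target` and the plain nested-list matrix are hypothetical and not part of CAJAL or navis. This does not save any computation, since the query-query and target-target pairs are still calculated.

```python
def query_vs_target(all_by_all, ids, query_ids, target_ids):
    """Slice a query x target block out of a square all-by-all matrix.

    all_by_all : square nested list, all_by_all[i][j] is the distance
                 between ids[i] and ids[j] (hypothetical layout)
    ids        : neuron IDs in the order used for the all-by-all run
    """
    index = {name: i for i, name in enumerate(ids)}
    return [
        [all_by_all[index[q]][index[t]] for t in target_ids]
        for q in query_ids
    ]


# Toy example with made-up distances for four neurons
ids = ["q1", "q2", "t1", "t2"]
all_by_all = [
    [0.0, 0.3, 0.7, 0.9],
    [0.3, 0.0, 0.4, 0.6],
    [0.7, 0.4, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
block = query_vs_target(all_by_all, ids, ["q1", "q2"], ["t1", "t2"])
```

A native query-vs-target mode would only need to compute the entries in `block`, which is where a PR against CAJAL would pay off.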

@schlegelp
Collaborator Author

schlegelp commented Sep 12, 2024

Background

CAJAL takes a skeleton and selects N uniformly (?) sampled points. Here is an example for N of 50, 100, 200 and 500:
[image: skeleton with N = 50, 100, 200 and 500 sampled points]
In their docs they suggest around 100 points - I'm not sure that's enough to describe the shape of this PN though. For the tests below, I ended up using N=200.
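To make "N evenly sampled points" concrete, here is a minimal sketch that resamples a single unbranched path at N points spaced evenly by arc length. This is an illustration only: it is not CAJAL's sampling code, it ignores branching, and the function name `resample_polyline` is made up for this example.

```python
import math


def resample_polyline(points, n):
    """Return n points evenly spaced by arc length along a 3D polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least 2 input points and n >= 2")

    # Cumulative path length at every vertex
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]

    out = []
    seg = 0  # index of the segment that contains the current target
    for i in range(n):
        target = total * i / (n - 1)
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        # Fractional position within the segment, clamped against rounding
        t = 0.0 if seg_len == 0 else min(1.0, (target - cum[seg]) / seg_len)
        a, b = points[seg], points[seg + 1]
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out


# An L-shaped path of total length 20, resampled at 5 points
path = [(0, 0, 0), (10, 0, 0), (10, 10, 0)]
samples = resample_polyline(path, 5)
```

With too few points, fine branches between two samples are simply not represented, which is the concern with N=100 for a PN.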

From there, CAJAL computes an intracellular distance matrix based on either the Euclidean or the geodesic distance between the sample points:
[image: Euclidean and geodesic intracellular distance matrices]
Both matrices capture the overall structure: dense proximal dendrite and less-branchy, far-away axon. Importantly, using geodesic distance means that our final distances are invariant to bending or flexing - i.e. the same neuron but bent differently in space would have a distance of 0.
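The bending invariance can be checked on a toy skeleton. The sketch below builds both matrices for a four-node chain, once straight and once bent. It is an assumption-laden illustration, not CAJAL's implementation: distances are taken between skeleton nodes rather than resampled points, and the helper names are hypothetical.

```python
import math


def euclidean_matrix(coords):
    """Straight-line distance between every pair of nodes."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def geodesic_matrix(coords, parents):
    """Along-the-skeleton distance between every pair of nodes.

    parents[i] is the index of node i's parent, or -1 for the root.
    Uses Floyd-Warshall, which is fine for toy-sized skeletons only.
    """
    n = len(coords)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
    for i, p in enumerate(parents):
        if p >= 0:
            d[i][p] = d[p][i] = math.dist(coords[i], coords[p])
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Same chain topology and edge lengths, two different shapes
parents = [-1, 0, 1, 2]
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

geo_same = geodesic_matrix(straight, parents) == geodesic_matrix(bent, parents)
euc_same = euclidean_matrix(straight) == euclidean_matrix(bent)
```

Here `geo_same` is `True` and `euc_same` is `False`: bending changes the Euclidean matrix but leaves the geodesic one untouched, so any distance built purely on the geodesic matrix cannot tell the two shapes apart.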

As a testbed I took the ~130 uniglomerular PNs from the hemibrain and from one hemisphere of the FlyWire dataset and ran:

  1. CAJAL's Gromov-Wasserstein distance based on the geodesic intracellular matrix
  2. CAJAL's Gromov-Wasserstein distance based on the Euclidean intracellular matrix
  3. Run-of-the-mill NBLAST

Timings

Start to finish, CAJAL took 2-3 minutes. NBLAST took just under 2 minutes, of which 40s was the transform to a common brain space. With so few neurons in the test, the pairwise comparisons made up only a small fraction of the total time, so I'm not sure how this scales to larger datasets.

Accuracy

To assess accuracy I simply asked: "How often is the top hit of the same cell type as the query?"

nblast_correct    0.912
euc_correct       0.200
geo_correct       0.208

At first glance this looks a bit bleak: NBLAST returns the correct cell type for 91% of all neurons, CAJAL for only about 20%. Maybe the CAJAL distance is at least high whenever the top hit is wrong? Alas, not really:

[image: CAJAL distances for correct vs. incorrect top hits]

In some sense this is not very surprising: PNs all look somewhat similar. I also noticed that CAJAL tends to confuse neurons from adjacent glomeruli - e.g. DA4l_adPN instead of DA4m_adPN - which, again, is understandable given that it has no notion of absolute positioning.

What if we relax the conditions and ask "How often is the correct cell type among the top 5 hits?"

nblast_correct5    0.976
euc_correct5       0.432
geo_correct5       0.456
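For reference, the top-1 and top-5 numbers correspond to a calculation along these lines. This is a sketch with made-up data, not the code used for the numbers above; `top_k_accuracy` and its parameters are hypothetical names. The `higher_is_better` flag covers the difference between distances (CAJAL, lower is better) and similarity scores (NBLAST, higher is better).

```python
def top_k_accuracy(scores, query_labels, target_labels, k=1,
                   higher_is_better=False):
    """Fraction of queries whose cell type appears among the k best targets.

    scores[i][j] is the score between query i and target j.
    """
    hits = 0
    for row, label in zip(scores, query_labels):
        # Rank target indices from best to worst for this query
        order = sorted(range(len(row)), key=row.__getitem__,
                       reverse=higher_is_better)
        if label in {target_labels[j] for j in order[:k]}:
            hits += 1
    return hits / len(query_labels)


# Toy distance matrix: 3 queries x 3 targets, made-up values
dist = [
    [0.1, 0.5, 0.9],
    [0.6, 0.4, 0.2],
    [0.3, 0.2, 0.8],
]
query_types = ["A", "B", "C"]
target_types = ["A", "B", "C"]

top1 = top_k_accuracy(dist, query_types, target_types, k=1)
top2 = top_k_accuracy(dist, query_types, target_types, k=2)
```

In this toy case only the first query has the right type as its top hit, while the second query is rescued once the top 2 hits are considered.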

This looks better, but NBLAST still clearly outperforms CAJAL in this (somewhat unfair) test. It's still important to acknowledge that NBLAST requires a spatial transform to align neurons while CAJAL doesn't, which I think is a huge strength. An obvious scenario where CAJAL would likely do better than NBLAST is the visual system of the fly, or any system with repeated, columnar organisation for that matter. Should tune the tutorial accordingly!


Edit: increasing the number of points from 200 to 500 doesn't have any noticeable impact on the distances - it does increase the time to calculate them by a lot though.
